Overfitting Reduction of Text Classification Based on AdaBELM

Authors

Xiaoyue Feng

Yanchun Liang

Xiaohu Shi

Dong Xu

Xu Wang

Renchu Guan

Abstract

Overfitting is an important problem in machine learning. Several algorithms, such as the extreme learning machine (ELM), suffer from it when facing high-dimensional sparse data, e.g., in text classification. A common difficulty is that the extent of overfitting is not well quantified. In this paper, we propose a quantitative measure of overfitting, referred to as the rate of overfitting (RO), and a novel model, named AdaBELM, to reduce overfitting. With RO, the overfitting problem can be quantitatively measured and identified. The proposed model achieves high performance on multi-class text classification. To evaluate its generalizability, we designed experiments on three datasets, i.e., the 20 Newsgroups, Reuters-21578, and BioMed corpora, which represent balanced, unbalanced, and real application data, respectively. Experimental results demonstrate that AdaBELM can reduce overfitting and outperform classical ELM, decision trees, random forests, and AdaBoost on all three text-classification datasets; for example, it can achieve 62.2% higher accuracy than ELM. Therefore, the proposed model has good generalizability.
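To make the overfitting issue concrete, the following is a minimal sketch of a basic ELM (random, untrained hidden layer with least-squares output weights) fitted to synthetic high-dimensional sparse data. It assumes NumPy; the function names, the synthetic data generator, and all sizes are hypothetical choices made for illustration. The train/test accuracy gap computed at the end is only a simple stand-in for how overfitting might be observed: it is not the RO measure defined in the paper, and the code does not implement AdaBELM.

```python
import numpy as np


def sigmoid_hidden(X, W, b):
    """Hidden-layer activations of the ELM (sigmoid of a random projection)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def train_elm(X, y, n_hidden, n_classes, rng):
    """Fit a basic ELM: hidden weights are random and fixed, only beta is solved."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = sigmoid_hidden(X, W, b)
    T = np.eye(n_classes)[y]                     # one-hot targets
    beta = np.linalg.pinv(H) @ T                 # least-squares output weights
    return W, b, beta


def predict_elm(model, X):
    W, b, beta = model
    return np.argmax(sigmoid_hidden(X, W, b) @ beta, axis=1)


def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))


def make_sparse_data(n, d, n_classes, rng, density=0.05):
    """Hypothetical bag-of-words-like data: mostly zeros plus one marker feature per class."""
    y = rng.integers(0, n_classes, size=n)
    X = (rng.random((n, d)) < density).astype(float)
    X[np.arange(n), y] = 1.0                     # weak class signal in the first columns
    return X, y


rng = np.random.default_rng(0)
n_classes = 4
X_train, y_train = make_sparse_data(60, 500, n_classes, rng)
X_test, y_test = make_sparse_data(200, 500, n_classes, rng)

# More hidden units than training samples lets the ELM fit the training set very closely.
model = train_elm(X_train, y_train, n_hidden=200, n_classes=n_classes, rng=rng)

train_acc = accuracy(y_train, predict_elm(model, X_train))
test_acc = accuracy(y_test, predict_elm(model, X_test))
gap = train_acc - test_acc                       # illustrative gap only, not the paper's RO

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
print(f"accuracy gap:   {gap:.3f}")
```

In this setup the hidden layer has more units than there are training documents, so the least-squares solution can fit the training labels very closely; comparing the two accuracies gives a rough, informal view of how much of that fit carries over to unseen data.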

Similar resources

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Beyond its own merits, text classification (TC) has become a cornerstone in many applications. Work presented here is part of and a pre-requisite for a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in the amount of information, tools and methods are needed to search, filter and manage resources. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...

Two Step POS Selection for SVM Based Text Categorization

Although many researchers have verified the superiority of Support Vector Machine (SVM) on text categorization tasks, some recent papers have reported much lower performance of SVM based text categorization methods when focusing on all types of parts of speech (POS) as input words and treating large numbers of training documents. This was caused by the overfitting problem that SVM sometimes sel...

Mining User Requirements from Application Store Reviews Using Frame Semantics

N. Jha and A. Mahmoud, Requirements Engineering: Foundation for Software Quality (REFSQ), accepted, 2017. Research on mining user reviews in mobile application stores has noticeably advanced in the past few years. The majority of the proposed techniques rely on classifying the textual description of user reviews...

A Domain Adaptation Regularization for Denoising Autoencoders

Finding domain invariant features is critical for successful domain adaptation and transfer learning. However, in the case of unsupervised adaptation, there is a significant risk of overfitting on source training data. Recently, a regularization for domain adaptation was proposed for deep models by (Ganin and Lempitsky, 2015). We build on their work by suggesting a more appropriate regularizati...

Journal: Entropy

Volume 19

Publication date: 2017
